Zero-shot domain paraphrase with unaligned pre-trained language models

Authors

Abstract

Automatic paraphrase generation is an essential task in natural language processing. However, due to the scarcity of paraphrase corpora in many languages, Chinese for example, generating high-quality paraphrases in these languages is still challenging. This is especially true for domain paraphrasing, where it is even more difficult to obtain in-domain sentence pairs. In this paper, we propose a novel approach that performs domain-specific paraphrasing in a zero-shot fashion. Our model is based on a sequence-to-sequence architecture: the encoder is a pre-trained multilingual autoencoder model, and the decoder is a pre-trained monolingual autoregressive model. Because the two models are pre-trained separately, they have different representations for the same token; we therefore call them unaligned models. We train the model with an English-to-Chinese machine translation corpus. Then, by inputting Chinese sentences into the model, it can surprisingly generate fluent and diverse paraphrases. Since the encoder and decoder have inconsistent understandings of the language, we believe the paraphrasing is actually performed in a Chinese-to-Chinese manner. In addition, we collect a small-scale in-domain corpus in computer science. By fine-tuning on this corpus, our model shows excellent capability for domain paraphrasing. Experimental results show that our approach significantly outperforms previous baselines regarding Relevance, Fluency, and Diversity.
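The "unaligned" property described above can be illustrated with a minimal, self-contained sketch. The paper does not publish its code here, so the vocabularies, embedding dimensions, and helper function below are hypothetical toy stand-ins; the point is only that two separately pre-trained models assign different representations to the same surface token.

```python
import random

# Toy sketch of the "unaligned" encoder/decoder vocabularies described above.
# The multilingual encoder and the monolingual decoder are pre-trained
# separately, so the same surface token ends up with a different embedding
# in each model's representation space.

def make_embeddings(vocab, dim=4, seed=0):
    """Build a toy embedding table: one random vector per token.

    Different seeds stand in for independent pre-training runs.
    """
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in vocab}

# Hypothetical vocabularies: the encoder's (multilingual) and the
# decoder's (Chinese-only).
encoder_vocab = ["hello", "world", "你", "好"]
decoder_vocab = ["你", "好", "吗"]

# Separate "pre-training" -> separate, unaligned representation spaces.
encoder_emb = make_embeddings(encoder_vocab, seed=1)
decoder_emb = make_embeddings(decoder_vocab, seed=2)

# The shared token "你" has a different vector in each space, which is
# why the paper calls the two models "unaligned".
print(encoder_emb["你"] != decoder_emb["你"])
```

In the actual system a sequence-to-sequence model bridges these two spaces: it is trained on English-to-Chinese translation pairs, and at inference Chinese input fed to the multilingual encoder yields Chinese output from the decoder, i.e. a paraphrase.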


Similar articles

Zero-Shot Deep Domain Adaptation

Current state-of-the-art approaches in domain adaptation and fusion show promising results with either labeled or unlabeled task-relevant target-domain training data. However, the fact that the task-relevant target-domain training data can be unavailable is often ignored by the prior works. To tackle this issue, instead of using the task-relevant target-domain training data, we propose zero-shot...


ImageNet pre-trained models with batch normalization

Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most state-of-the-art approaches. In this paper, we present a new set of pretrained models with popular state-of-the-art architectures for the Caffe framework. The first release includes Residual Networks (ResNets) with generation script as well as the batch-normalization-variants of AlexNet and VGG19. All models ou...


Zero-shot Cross Language Text Classification

Labeled text classification datasets are typically only available in a few select languages. In order to train a model for e.g. news categorization in a language Lt without a suitable text classification dataset there are two options. The first option is to create a new labeled dataset by hand, and the second option is to transfer label information from an existing labeled dataset in a source la...


Hashing in the zero shot framework with domain adaptation

Techniques to learn hash codes which can store and retrieve large dimensional multimedia data efficiently have attracted broad research interest in recent years. With the rapid explosion of newly emerged concepts and online data, existing supervised hashing algorithms suffer from the problem of scarcity of ground truth annotations due to the high cost of obtaining manual annotations. Therefore, w...


Zero-Shot Learning for Natural Language Understanding Using Domain-Independent Sequential Structure and Question Types

Natural language understanding (NLU) is an important module of spoken dialogue systems. One of the difficulties when it comes to adapting NLU to new domains is the high cost of constructing new training data for each domain. To reduce this cost, we propose a zero-shot learning of NLU that takes into account the sequential structures of sentences together with general question types across diffe...



Journal

Journal title: Complex & Intelligent Systems

Year: 2022

ISSN: 2198-6053, 2199-4536

DOI: https://doi.org/10.1007/s40747-022-00820-8